Deforestation in Indonesia has reached an alarming level and continues from
year to year. It strongly affects the decline in fauna and the loss of
biodiversity. This paper proposes a bioacoustics-based forest quality
assessment tool using the Nvidia Jetson Nano and convolutional neural
networks (CNN). The device, named GamaDet, is a portable physical product
based on the microprocessor and equipped with a microphone to record the
sounds of birds in the forest and display the results of their analysis. In
addition, a Google Colaboratory-based digital product, GamaNet, is also
proposed. GamaNet takes forest audio recordings as input and analyzes them
into a forest quality index. Tests on 60-second forest recordings from an
arboretum showed that both products work well. GamaDet takes 370 seconds,
while GamaNet takes 70 seconds, to process the audio data into a forest
quality index and a list of detected birds.

1. INTRODUCTION 

A tropical forest is an ecosystem in the tropics: a stretch of land containing biological natural
resources, dominated by trees in their natural environment. Such forests are needed, among other things, to
slow or prevent climate change. Forest quality tends to decrease due to natural and human factors, with
human activity dominating. Over the last five years, the average annual area of deforestation was estimated at
10 million hectares worldwide. There needs to be a balance between human greed, today’s human needs, and the
need to provide a healthy environment that supports living things and remains sustainable for future
generations. Tropical forests need to be protected from damage, especially damage caused by human activities.
Forests need to be monitored and assessed to maintain the sustainability of forest ecosystems. Forest quality
assessment aims to determine the current forest condition, changes, and trends that may occur [1], [2].

Forest quality is closely related to the health of the ecosystem. A good ecosystem has balanced
components that interact and meet each other’s needs. The bioacoustic index is one way to identify the
quality of the ecosystem in a place by monitoring the status and trends of bird diversity in a particular
location [3]. 

Birds can be used as indicators of ecosystem changes in a forest. Many birds live in a forest habitat
and have a high and dynamic level of mobility, so they respond quickly to changes in their environment [4], [5].
Overall, birds can function as the guard and superior species, which makes them suitable indicators
of a forest ecosystem. In this case, indicators are defined as organisms closely related to specific
environmental conditions, so that changes in the environment change the nature or existence of the
organisms [2]. 

Based on [2], at least four properties make birds suitable as environmental indicator species:
i) the distribution, ecology, biology, and life history of birds are well known compared with other fauna;
ii) birds occupy a fairly high position in the food chain, so they are sensitive to changes in
environmental pollution; iii) bird survey techniques are simpler; and iv) monitoring birds is relatively
cheaper than monitoring other fauna such as reptiles and mammals. 

Traditionally, bird diversity has been monitored using point counts. This method requires a trained
expert to identify and count birds visually and aurally in the field at 5-10-minute intervals at the sampling
site. The accuracy of these counts is negatively affected by detection variability between species and sites
and by weather conditions that influence bird movement and activity. Another drawback of this method is that
monitoring large areas at high temporal resolution is usually challenging due to logistical and financial
constraints. The results of such surveys may also be biased by the observer’s level of
experience [6], [7]. 

In contrast, passive acoustic monitoring, which uses an automatic recording unit to monitor the acoustic
environment, can be carried out continuously with adjustable recording periods. Data collection using this
method is more cost-effective and flexible and has been widely used for ecological monitoring over the past
decade. Acoustic data are stored as a permanent record with higher temporal resolution than point counts [7],
[8]. This allows researchers to revisit the data to perform additional analyses or to manually verify
automatically detected signals, reducing the bias and uncertainty associated with the extracted observations. 

Several artificial intelligence methods have been applied in various applications [9]-[12]. One
artificial intelligence method for recognizing and classifying acoustic signals is the convolutional neural
network (CNN). CNNs can outperform traditional techniques in bioacoustic classification and detection [13],
[14]. This paper proposes a CNN-based artificial intelligence model to classify bird species based on sound
recordings [15], [16]. The system has an additional bioacoustic analysis capability that assesses the
quality and diversity of forest ecosystems based on the recognition of birds’ voices in the forest [17], [18]. 


2. RESEARCH METHOD 
2.1. Tropical forest quality assessment tool 

The tropical forest quality assessment tool is designed to be portable, inexpensive, and easy to use,
and its components were chosen to meet these requirements. The Nvidia Jetson Nano was chosen as the
microprocessor of the system, called GamaDet. The Rode VideoMic Go mini-shotgun microphone, with a
frequency range of 2-8 kHz, was used to acquire and record forest sounds. Other components, such as a Wi-Fi
dongle, sound card, and tripod, were chosen with the optimality and functionality of the design in mind. For
GamaNet, Google Colaboratory was selected as the online computing framework. The library used in the design
of GamaNet is open source and can be developed further. 

The forest quality index assessment system was written in the Python programming language. The
system processes the bioacoustics of bird sounds. Four bioacoustic indices are used as forest quality
parameters: the acoustic evenness index (AEI), the bioacoustic index (BIO), the normalized difference
soundscape index (NDSI), and entropy (H). AEI assumes that each biotic species in a natural ecosystem has a
unique frequency and active time that differ from those of other species [19], [20]. 

This approach can be used for spatiotemporal comparison. BIO measures the frequency spectrum of
bird sounds in the range of 2-8 kHz: the larger the BIO, the more dominant the biotic components in the
environment. The H index is used to measure one animal species and all the biotic components of an
ecosystem. The greater the H index, the more diverse the ecosystem. The last acoustic index, NDSI, compares
anthrophony in the 1-2 kHz band with biophony in the 2-8 kHz band. NDSI therefore shows how dominant human
presence is in an ecosystem [19]. 
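
As an illustration of the NDSI described above, the following minimal Python sketch uses the commonly stated form NDSI = (B - A) / (B + A), where B is the biophony power (2-8 kHz) and A is the anthrophony power (1-2 kHz). It is a simplified band-power version based on a single FFT, not necessarily the exact procedure used in GamaNet; the function names `band_power` and `ndsi` and the synthetic test tones are assumptions made for this example.

```python
import numpy as np


def band_power(signal, sample_rate, f_lo, f_hi):
    """Sum of the FFT power spectrum between f_lo and f_hi (in Hz)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return float(spectrum[mask].sum())


def ndsi(signal, sample_rate, anthro=(1000.0, 2000.0), bio=(2000.0, 8000.0)):
    """Simplified NDSI = (B - A) / (B + A); values lie between -1 and +1."""
    a = band_power(signal, sample_rate, *anthro)  # anthrophony, 1-2 kHz
    b = band_power(signal, sample_rate, *bio)     # biophony, 2-8 kHz
    if a + b == 0.0:
        return 0.0  # no energy in either band (assumed convention)
    return (b - a) / (b + a)


# Synthetic one-second signals (hypothetical stand-ins for field recordings).
sr = 22050
t = np.arange(sr) / sr
bird_like = np.sin(2 * np.pi * 4000 * t)    # energy only in the biophony band
engine_like = np.sin(2 * np.pi * 1500 * t)  # energy only in the anthrophony band

print(ndsi(bird_like, sr))                 # close to +1
print(ndsi(engine_like, sr))               # close to -1
print(ndsi(bird_like + engine_like, sr))   # close to 0
```

A value near +1 indicates that biophony dominates, a value near -1 indicates that anthrophony dominates, and a value near 0 indicates comparable power in both bands.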

The proposed tropical forest quality assessment tool is shown in Figure 1. The device has two main
features: analyzing forest quality based on the bioacoustic parameters of bird sounds, and analyzing the
diversity of birds in a place [21], [22]. The two are combined to gain new insights into ecological quality.
We used Python and several libraries, such as matplotlib, librosa, and pandas, as the basis of the system’s
code. In the deployment stage, GamaNet is installed on the Nvidia Jetson Nano [23], [24]. After this step,
the user can use the functions of the GamaDet tool. These functions include sound recording, data
processing, and visualization, and they can be accessed online from the user’s device. Details of
the prototype specifications are given in Table 1. GamaNet is also deployed in a web-based environment,
namely Google Colaboratory [25]. Users can access the deployed GamaNet through a specific
internet address. 


[Block diagram: microphone, microcontroller, laptop]


Figure 1. Tropical forest quality assessment tool using Nvidia Jetson Nano and CNN 


Table 1. GamaDet specification 


No  Parameter                Specification 
1   Length x width x height  11.5 x 9.5 x 5 cm (without tripod) 
                             60 x 60 x 120 cm (with tripod) 
2   Data processor           Quad-core ARM Cortex-A57 CPU, 4 GB RAM, 128-core Nvidia Maxwell GPU 
3   Operating system         JetPack 4.6, based on Linux Ubuntu 18.04 
4   Microphone               Supercardioid Rode VideoMic Go 
5   Power supply             Redmi power bank, 20,000 mAh 
6   Connectivity             Wi-Fi, TP-Link WN725N, 150 Mbps 

2.2. Bird sound data collection 

The bird sound dataset consists of two parts. The first is a large dataset of bird recordings from
around the world covering 984 bird species. The second consists of recordings of birds endemic to Java
Island. The first dataset was collected from the Kaggle site, while the second was collected from the
Xeno-Canto site. 


2.3. Signal processing and machine learning 

We designed the machine learning model based on a CNN [26]. CNN-based models are widely used in
the field of image processing. In this paper, the system first converts the audio data into a spectrogram.
The spectrogram is then converted into mel-frequency cepstral coefficients (MFCC) [27] or a
mel-spectrogram via the Fourier transform [28], as shown in Figure 2. 
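
The conversion from audio to a mel-spectrogram can be sketched as follows. This is a NumPy-only illustration written for this section, not the authors' implementation (the paper lists librosa among its libraries). The frame length `n_fft`, the hop size `hop`, and the choice of 64 mel bands are assumptions, the last chosen only to match the input height of 64 listed in Table 2. The mel-scale formula used here is one common variant.

```python
import numpy as np


def hz_to_mel(f):
    """Convert Hz to mel using a common logarithmic formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def power_spectrogram(signal, n_fft=1024, hop=256):
    """Short-time Fourier transform power, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2


def mel_filterbank(sample_rate, n_fft, n_mels=64, f_min=0.0, f_max=None):
    """Triangular filters equally spaced on the mel scale."""
    f_max = sample_rate / 2 if f_max is None else f_max
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    edges = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mel_spectrogram(signal, sample_rate, n_fft=1024, hop=256, n_mels=64):
    """Project the linear-frequency power spectrogram onto mel bands."""
    spec = power_spectrogram(signal, n_fft, hop)
    return mel_filterbank(sample_rate, n_fft, n_mels) @ spec


# Hypothetical input: a one-second 4 kHz tone instead of a forest recording.
sr = 22050
tone = np.sin(2 * np.pi * 4000 * np.arange(sr) / sr)
mel = mel_spectrogram(tone, sr)
print(mel.shape)  # (number of mel bands, number of frames)
```

The mel filterbank compresses the linear frequency axis so that low frequencies are resolved more finely than high ones, which yields a compact, image-like input for the CNN.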

Figure 2. Comparison of spectrogram with mel-spectrogram [27] 

One CNN model with good performance is the Wide-ResNet model [7]. The model is relatively
shallow, so its training time is shorter without sacrificing the accuracy of the
standard ResNet model [29]. The architecture of the ResNet model used in this paper is shown in Table 2. 
The training flowchart of the proposed machine learning model is shown in Figure 3. 


Table 2. BirdNet architecture [7] 


Group           Name                         Input shape    Output shape 
Pre-processing  5x5 Conv + BN + ReLU         (1x64x384)     (32x64x384) 
                Max Pooling                  (32x64x384)    (32x64x192) 
ResStack 1      Downsampling block           (32x64x192)    (64x32x96) 
                2x ResBlock                  (64x32x96)     (64x32x96) 
ResStack 2      Downsampling block           (64x32x96)     (128x16x48) 
                2x ResBlock                  (128x16x48)    (128x16x48) 
ResStack 3      Downsampling block           (128x16x48)    (256x8x24) 
                2x ResBlock                  (256x8x24)     (256x8x24) 
ResStack 4      Downsampling block           (256x8x24)     (512x4x12) 
                2x ResBlock                  (512x4x12)     (512x4x12) 
Classification  4x10 Conv + BN + ReLU + DO   (512x4x12)     (512x1x3) 
                1x1 Conv + BN + ReLU + DO    (512x1x3)      (1024x1x3) 
                1x1 Conv + BN + DO           (1024x1x3)     (987x1x3) 
                Global LME Pooling           (987x1x3)      (987x1) 
                Sigmoid activation           (987x1)        (987x1) 
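
The last two rows of Table 2 can be illustrated with a short sketch. It assumes the usual definition of log-mean-exp (LME) pooling, (1/alpha) * log(mean(exp(alpha * x))), applied over the last axis, followed by a sigmoid. The sharpness parameter `alpha=5.0` and the random input values are assumptions for illustration only; the source does not state the value used.

```python
import numpy as np


def lme_pool(scores, alpha=5.0, axis=-1):
    """Log-mean-exp pooling over one axis.

    For alpha > 0 the result lies between the mean and the maximum
    of the pooled values; larger alpha moves it toward the maximum.
    """
    x = alpha * np.asarray(scores, dtype=float)
    x_max = x.max(axis=axis, keepdims=True)
    # Subtract the maximum before exponentiating for numerical stability.
    pooled = x_max + np.log(np.mean(np.exp(x - x_max), axis=axis, keepdims=True))
    return np.squeeze(pooled, axis=axis) / alpha


def sigmoid(x):
    """Map scores to the interval (0, 1), one value per class."""
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical classifier output with the shape from Table 2: (987x1x3).
logits = np.random.default_rng(0).normal(size=(987, 1, 3))
probs = sigmoid(lme_pool(logits))  # pooled over the last axis
print(probs.shape)  # (987, 1)
```

Because each class passes through its own sigmoid rather than a shared softmax, several species can receive a high score for the same audio segment.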

Figure 3. Flowchart of the training of the machine learning model 


2.4. Machine learning evaluation 

The evaluation of this product consisted of two stages. The first test used validation data, which
contain ground truth labels against which the test results are compared. The validation data for the machine
learning model were obtained from [7], while the data for validating the forest quality index were obtained
from [19]. The second test was carried out directly at five points in the arboretum forest of the Faculty of
Forestry, Universitas Gadjah Mada (UGM). 


3. RESULTS AND DISCUSSION 
3.1. GamaDet 

GamaDet is an integrated device that can record, store, and process bird sound data and display the
forest quality index. GamaDet is designed around a microcontroller connected to a microphone to
record forest sounds. GamaDet is equipped with a tripod to increase portability so that measuring points in
the forest can be easily reached. In terms of connectivity, GamaDet provides a Wi-Fi connection, so the user
interface can be accessed simply by connecting the user’s device (laptop or smartphone) to GamaDet’s IP
address. 

GamaDet starts by recording environmental sounds at a given measuring point through the
microphone. The recordings are stored on a secure digital (SD) card for further processing. When the
recording is complete, the microprocessor processes it through a trained machine learning model. The
recordings are converted into a mel-spectrogram, which the model uses to identify forest quality
and bird sounds. If the identification probability of a bird’s voice exceeds a specific limit, the
identification result is stored on the user’s computer. The analysis results are sent to the user’s computer
via the Wi-Fi network. In the analysis results, both the forest quality bioacoustic parameters and the bird
species are visualized on the user’s screen. 
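
The probability limit mentioned above can be sketched as a simple filter. The threshold value of 0.5, the function name `filter_detections`, and the placeholder species names are assumptions for illustration; the source does not state the limit used by GamaDet.

```python
def filter_detections(probabilities, threshold=0.5):
    """Keep species whose probability exceeds the threshold.

    `probabilities` maps a species name to the model output in [0, 1].
    The result is sorted from most to least confident.
    """
    kept = [(name, p) for name, p in probabilities.items() if p > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical model output for one audio segment.
scores = {"species_a": 0.91, "species_b": 0.12, "species_c": 0.64}
for name, p in filter_detections(scores, threshold=0.5):
    print(f"{name}: {p:.2f}")
```

Raising the threshold reduces false detections at the cost of missing quieter or more distant birds, so the limit is a tuning choice for each deployment.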

3.2. GamaNet software 

We also propose GamaNet, a digital product based on a Jupyter notebook that can be accessed
online on a cloud computing engine. Sound data recorded in the forest environment are used as the input.
A dashboard showing the forest quality index and the detected bird species is then displayed. The user
interface of GamaNet is shown in Figure 4. GamaNet was created to make the tool more flexible, so that
recordings made with other standard-compliant devices can also be analyzed. 

Figure 4. The user interface of GamaNet 


To use GamaNet, users need to prepare audio data recorded in the forest environment as input. 
GamaNet processes the audio recording based on the model that has been created. The model’s output is a 
comma-separated values (CSV) file that can be visualized easily. The processing results are displayed on the 
dashboard containing the forest quality index and the detected bird species. 
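
A minimal sketch of such a CSV output, using only the Python standard library, is given below. The column names (`start_s`, `end_s`, `species`, `confidence`) and the example rows are hypothetical; the source does not describe the actual layout of the GamaNet file.

```python
import csv
import io

# Hypothetical column layout for the detection table.
FIELDS = ["start_s", "end_s", "species", "confidence"]


def detections_to_csv(rows):
    """Serialize a list of detection dictionaries to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def species_counts(csv_text):
    """Count how many detections each species has in the CSV text."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["species"]] = counts.get(row["species"], 0) + 1
    return counts


# Hypothetical detections from a 60-second recording.
rows = [
    {"start_s": 0, "end_s": 3, "species": "species_a", "confidence": 0.91},
    {"start_s": 3, "end_s": 6, "species": "species_c", "confidence": 0.64},
    {"start_s": 9, "end_s": 12, "species": "species_a", "confidence": 0.77},
]
text = detections_to_csv(rows)
print(text)
print(species_counts(text))
```

A flat table of this kind can be loaded directly into pandas or a spreadsheet, which is what makes the output easy to visualize on a dashboard.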

The data processing sub-system was tested using audio recorded in the arboretum forest of the Faculty of
Forestry, UGM, in October, every hour for 60 seconds from 07:00 to 17:00. The test results show that
GamaDet needs 370 seconds to process a 60-second recording into a forest quality index. The web version,
GamaNet, produces the same results from the same data in a shorter time, needing only 70 seconds to produce
the forest quality index. GamaNet processes data faster because it uses Google Colaboratory as the data
processing server, whose computational performance is higher than that of the Nvidia Jetson Nano. 
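
The reported processing times can be expressed as real-time factors, that is, processing time divided by audio duration. The short calculation below uses only the figures stated above; the term "real-time factor" and the variable names are introduced here for illustration.

```python
def real_time_factor(processing_s, audio_s):
    """Seconds of processing needed per second of audio."""
    return processing_s / audio_s


AUDIO_S = 60          # length of each test recording
GAMADET_S = 370       # reported GamaDet processing time
GAMANET_S = 70        # reported GamaNet processing time

rtf_gamadet = real_time_factor(GAMADET_S, AUDIO_S)  # about 6.17
rtf_gamanet = real_time_factor(GAMANET_S, AUDIO_S)  # about 1.17
speedup = GAMADET_S / GAMANET_S                     # about 5.29

print(f"GamaDet: {rtf_gamadet:.2f} s of processing per second of audio")
print(f"GamaNet: {rtf_gamanet:.2f} s of processing per second of audio")
print(f"GamaNet is about {speedup:.2f} times faster than GamaDet")
```

Neither product runs faster than real time, but with one 60-second recording per hour, as in the test schedule, both finish well before the next recording starts.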

We conducted two data analyses: bioacoustic indices and sound analysis for bird
identification. Figure 5 shows the bioacoustic indices and sound analysis results at the UGM arboretum. 
Figures 5(a) and 5(b) show the AEI, which ranges from 0.142 to 0.531, and the BIO of the sound data [20],
[30]. The system detected animal (biophony) sounds with BIO values from 5.083 to 9.996. BIO can also
indicate dawn and dusk singing times. The H index can indicate whether animals appear on a specific
schedule. Figure 5(c) shows the H index and suggests that animals in the ecosystem do not appear on a
particular schedule. An H index above 0.5 indicates that several animals occur quite often. Figure 5(d) shows
the NDSI, which ranges from -0.610 to 0.207. A zero NDSI means that biophony and anthrophony are at the same
level. A negative NDSI indicates that biophony is lower than anthrophony. Overall, the arboretum has a
diverse frequency of bird sounds. Bird identification is based on the bird sounds recorded by
the device. The device detected thirty-four bird sounds. This identification method was successful and
appropriate for identifying bird sounds. 

Figure 5. The results of (a) AEI, (b) BIO, (c) H index, and (d) NDSI measurement at UGM arboretum 


4. CONCLUSION 

We successfully designed a tropical forest quality assessment tool that recognizes the sounds of birds in
the forest. This tool consists of GamaDet and GamaNet. GamaDet is based on the Nvidia Jetson Nano and
acquires and records the forest environment’s sound, identifies bird sounds, and assesses the forest quality
index. The system needs 370 seconds to process 60 seconds of audio data. Meanwhile, GamaNet, which is
based on Google Colaboratory, succeeded in identifying bird sounds and assessing the quality of the provided
audio recording, with a processing time of 70 seconds for 60 seconds of audio data. 